Self-assembly of protein amyloids: 
a competition between amorphous and ordered aggregation

Protein aggregation in the form of amyloid fibrils has important biological and technological
implications. Although the self-assembly process is highly efficient, non-fibrillar aggregates
also occur, and these disordered species must be included when discussing the thermodynamic
equilibrium behavior of the system. Here, we initiate such a task by considering a mixture of
monomeric proteins and the corresponding aggregates in the disordered form (micelles) and in
the fibrillar form (amyloid fibrils). Starting with a model of the respective binding free
energies of these species, we calculate their concentrations at thermal equilibrium. We then
discuss how incorporating the disordered structure furthers our understanding of the various
amyloid-promoting factors observed empirically, and of the kinetics of fibrilization.

PACS numbers: 87.14.em, 82.35.Pq, 05.70.-a 



I. INTRODUCTION 

Amyloids are insoluble fibrous protein aggregates stabilized by a network of hydrogen bonds
and hydrophobic interactions. They are intimately related to many neurodegenerative diseases,
such as Alzheimer's disease, Parkinson's disease and the prion diseases. Better
characterization of the various properties of amyloid fibrils is therefore of high importance
for understanding the associated pathogenesis. More recently, with protein amyloid formation
viewed as a highly efficient self-assembly process, possible applications have also been
proposed. For instance, amyloid fibrils have been employed as nanowire templates, and have
been shown to possess great tensile strength and complex phase behavior similar to that of
liquid crystals [10, 11]. Given the importance of protein amyloids in biology and potentially
in technology, they are being studied intensively. In particular, much effort has been spent
on investigating the dependence of amyloid propensity on amino-acid composition; the
possibility of predicting amyloid propensity from the primary sequence [20, 21, 22, 23]; the
mechanical properties of protein amyloids [24, 25]; the kinetics of amyloid formation; as
well as the thermodynamic behavior of the aggregation process.

Although the protein amyloid self-assembly process is highly efficient, non-fibrillar
aggregates also occur, and these disordered species must be included when discussing the
thermodynamic equilibrium behavior of the system. This motivates us to consider here a system
consisting of a mixture of monomers, aggregates with a linearly ordered structure (fibrils)
and aggregates with a disordered structure (micelle-like aggregates) (cf. Fig. 1). Starting
with a discussion of their respective binding free energies, we deduce the concentrations of
the various species at thermal equilibrium, and consider the experimental implications of our
investigation. In particular, we study the effect of temperature and pressure variations on
the average fibrillar length. We then discuss how our work relates to the empirically
observed variation in amyloid propensity with respect to the primary sequences of the
proteins. Finally, we employ the formalism developed to study the kinetic process of
aggregation.

FIG. 1: (Color online) Schematic diagrams of the three species considered in this paper:
(a) a monomeric protein in solution; (b) a micelle, or an amorphous aggregate; (c) an
eight-monomer segment of an amyloid fibril consisting of two cross-beta structures (one
cross-beta structure is coloured, the other is black). Hydrogen bonds stabilise the beta
sheets in the vertical direction (not shown in this figure). (Drawn with Deep View.)

The plan of the paper is as follows: In Section II, we introduce our model of an
amyloid-forming self-assembly system. In Section III, we discuss the experimentally relevant
predictions of our model. In Section IV, we consider how our findings relate to empirical
observations on amyloid propensity. In Section V, we investigate the kinetic process of
self-assembly from the perspective of our free energy picture.

II. THE MODEL

In this work, we are primarily concerned with amyloid fibrilization of short peptides.
Peptides interact via an array of interactions, such as hydrophobic interactions, hydrogen
bonding, electrostatic interactions, etc. (for a review, see, e.g., [40]). Due to these
interactions, aggregation may occur and we consider here two different types of aggregates:
i) linearly structured aggregates (amyloid fibrils) and ii) disordered aggregates (micelles)
(cf. Fig. 1). For the micellar species, we assume that there is an optimal configuration
consisting of M proteins, where M is of the order of tens [62]. For the fibrillar species, we
assume that the only ordered structure is a two-tape structure, i.e., each fibril consists of
two stacked cross-beta structures (cf. Fig. 1(c)). We note that amyloid fibrils can exhibit
structural variations even when prepared under the same conditions, and that the precise
structural details are highly dependent on the primary sequence (see, e.g., [13]).

Now a note on terminology: we will call a free protein in solution a monomer, and a fibril
consisting of i proteins an i-mer fibril. From now on, we denote the numbers of monomers,
micelles and i-mer fibrils in a solution of volume V by $N^{(a)}$, $N^{(b)}$ and
$N_i^{(c)}$, respectively. In particular, if N denotes the total number of proteins, we have

$$N^{(a)} + M N^{(b)} + \sum_i i N_i^{(c)} = N . \qquad (1)$$



Given the three different species: monomers, micelles 
and fibrils, we are interested in determining their respec- 
tive volume fractions given an initial volume fraction C . 
In this work, we will always set the unit of volume to 
be the volume of one monomer. With this convention, 
C = N/V where V is the volume of the system. 

To calculate the relative abundances of the various species, we first need their respective
species-specific Binding Free Energies (BFE). Without loss of generality, we set the
monomeric BFE to zero, and denote the micellar BFE per protein by $\gamma$, where

$$\gamma = -T\Delta s_b + \Delta\epsilon_b + p\Delta v_b . \qquad (2)$$



In the above equation, $\Delta s_b$, $\Delta\epsilon_b$ and $\Delta v_b$ are the entropic,
binding-energy and volumic differences between the monomers and the micelles. In other words,
$-T\Delta s_b$ quantifies the free energy contribution from the loss in configurational
freedom due to the rigidity of the aggregates, $\Delta\epsilon_b$ quantifies the change in
internal energy resulting from the various inter-protein and intra-protein interactions, and
$\Delta v_b$ denotes the change in volume of a monomer as a result of being part of a larger
aggregate.

For the fibrillar species, we denote the BFE per protein of an infinitely long fibril by

$$f_\infty = -T\Delta s_c + \Delta\epsilon_c + p\Delta v_c . \qquad (3)$$



The terms on the right-hand side above are defined as in the micellar case. For a
finite-size i-mer fibril, we make the following assumption, typically made in the study of
linearly aggregating systems:

$$f_i = \left( \frac{i-\xi}{i} \right) f_\infty , \qquad (4)$$

where $\xi$ ($\xi > 0$) accounts for the boundary effect at the fibril's ends and is of order
one [43]. For instance, it accounts for the loss of hydrophobic interactions at both ends of
a fibril.
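
As a minimal numerical sketch of Eq. (4), the snippet below evaluates the per-protein BFE of an i-mer fibril and shows how it approaches $f_\infty$ as i grows. The values of `F_INF` and `XI` are illustrative assumptions, not parameters taken from this work, and all energies are expressed in units of $k_B T$.

```python
def fibril_bfe_per_protein(i, f_inf, xi):
    """Per-protein BFE of an i-mer fibril, f_i = (i - xi) / i * f_inf (Eq. 4)."""
    if i < 1:
        raise ValueError("fibril size i must be at least 1")
    return (i - xi) / i * f_inf


# Illustrative (assumed) parameters, in units of k_B T.
F_INF = -10.0  # BFE per protein in an infinitely long fibril
XI = 1.0       # boundary parameter, of order one

for size in (2, 10, 100, 1000):
    f_i = fibril_bfe_per_protein(size, F_INF, XI)
    # size * f_i is the total BFE of the fibril, (i - xi) * f_inf
    print(f"i = {size:5d}: f_i = {f_i:9.4f}, total = {size * f_i:10.2f}")
```

The total BFE of the fibril, $i f_i = (i - \xi) f_\infty$, differs from that of a segment of an infinite fibril by the constant end penalty $-\xi f_\infty$, independent of i.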

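With the BFE defined, we can calculate the concentrations of the various species by finding
the minimum of the total free energy of the system. We start by writing the total partition
function as [44]:

$$Q_{tot} = {\prod}' \, \frac{(AV)^{N^{(a)}}}{N^{(a)}!} \, \frac{(BV)^{N^{(b)}}}{N^{(b)}!} \, \prod_i \frac{(C_i V)^{N_i^{(c)}}}{N_i^{(c)}!} , \qquad (5)$$

where, by the previous discussion on the BFE [63],

$$A = 1 , \qquad (6)$$
$$B = \exp(-M\gamma/k_B T) , \qquad (7)$$
$$C_i = \exp[-(i-\xi) f_\infty/k_B T] . \qquad (8)$$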

Note that the prime on the product in Eq. (5) denotes the restriction that the total number
of peptides is conserved (cf. Eq. (1)). The distribution of the various species can now be
obtained by minimizing the total free energy density,

$$F_{tot} = -\frac{k_B T \log Q_{tot}}{V} , \qquad (9)$$

subject to the constraint shown in Eq. (1). This optimization problem can be solved by the
Lagrange multiplier method, and the results are (see, e.g., [45]):

$$n^{(b)} = \left[ n^{(a)} e^{-\gamma/k_B T} \right]^M , \qquad (10)$$
$$n_i^{(c)} = \left[ n^{(a)} e^{-(i-\xi) f_\infty / i k_B T} \right]^i . \qquad (11)$$

The lower case n denotes the volume fraction of the corresponding species, i.e.,
$n^{(a)} = N^{(a)}/V$, $n^{(b)} = N^{(b)}/V$ and $n_i^{(c)} = N_i^{(c)}/V$.
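
To make Eqs. (10) and (11) concrete, the following sketch solves the conservation constraint, Eq. (1), numerically for the monomer fraction by bisection, and reports how the protein is partitioned among the three species. It is an illustration under stated assumptions rather than the procedure used in this work: the parameter values are invented for the example, energies are in units of $k_B T$, and the smallest fibril is taken to be a dimer (`i_min=2`), a choice the text does not specify.

```python
import math


def equilibrium_fractions(C, gamma, f_inf, xi, M, i_min=2):
    """Partition a total fraction C using Eqs. (10)-(11) and the constraint Eq. (1).

    Returns the fractions of protein present as monomers, inside micelles and
    inside fibrils; the three values sum to C. Energies are in units of k_B T.
    """
    cfc = math.exp(f_inf)             # exp(f_inf / k_B T)
    prefactor = math.exp(xi * f_inf)  # end penalty shared by every fibril size

    def parts(x):
        # x is the monomer fraction n^(a)
        micelle = M * (x * math.exp(-gamma)) ** M
        q = x / cfc                   # n^(a) * exp(-f_inf / k_B T)
        if q >= 1.0:
            return x, micelle, math.inf   # the fibril sum diverges
        # sum_{i >= i_min} i q^i = q / (1 - q)^2 minus the omitted leading terms
        full = q / (1.0 - q) ** 2
        head = sum(i * q ** i for i in range(1, i_min))
        return x, micelle, prefactor * (full - head)

    lo, hi = 0.0, min(C, cfc)
    for _ in range(200):              # the total is increasing in x, so bisect
        mid = 0.5 * (lo + hi)
        if sum(parts(mid)) > C:
            hi = mid
        else:
            lo = mid
    return parts(lo)


# Assumed example parameters: fibrils are the more stable aggregate here.
params = dict(gamma=-8.0, f_inf=-10.0, xi=1.0, M=25)
for C in (1e-5, 1e-3):
    mono, mic, fib = equilibrium_fractions(C, **params)
    print(f"C = {C:.0e}: monomer {mono:.3e}, micelle {mic:.3e}, fibril {fib:.3e}")
```

Swapping the values of `gamma` and `f_inf` makes the micelle the more stable aggregate, and the same routine then places most of the protein in micelles at high C.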

For the micellar species, due to the magnitude of M (M > 10 [56]), if
$C < \exp(\gamma/k_B T)$, the micellar volume fraction is negligible in comparison to the
monomer volume fraction; conversely, if $C > \exp(\gamma/k_B T)$, then the monomeric volume
fraction is $\exp(\gamma/k_B T)$ and all the excess monomers are in the micellar form, i.e.,
$M n^{(b)} \simeq C - \exp(\gamma/k_B T)$ [43]. It is therefore legitimate to define a
critical concentration at $C_{crit} = \exp(\gamma/k_B T)$. We call this the Critical Micellar
Concentration (CMC). For the fibrillar species, similar reasoning indicates that the critical
concentration for fibrilization is $C_{crit} = \exp(f_\infty/k_B T)$. We call this the
Critical Fibrillar Concentration (CFC).

If CMC < CFC and C exceeds both the CMC and the CFC, almost all proteins are in the micellar
form and the concentrations of monomers and fibrils are negligible by comparison. On the
other hand, if CFC < CMC and C exceeds both critical concentrations, then the concentrations
of monomers and micelles are negligible, while the fibrillar species is abundant. In this
fibril-dominant regime, $n_i^{(c)}$ follows the distribution [43]:

$$n_i^{(c)} = \exp[-i/L + \xi f_\infty/k_B T] , \qquad (12)$$

where L is the average number of monomers in a fibril, such that

$$L = \langle i N_i^{(c)} \rangle = \sqrt{C e^{-\xi f_\infty/k_B T}} . \qquad (13)$$

Since a fibril is a linear structure, the average fibrillar length is proportional to L.
According to Eq. (13), the average fibril length scales with $\sqrt{C}$. This behavior is
observed in other linearly aggregating systems and is a manifestation of the one-dimensional
nature of the aggregates [43, 46]. The profile of $i \times n_i^{(c)}$ versus i is depicted
in the inset plot in Fig. 2. This analytical result is qualitatively confirmed by
experimental observations on β-lactoglobulin amyloid fibrils [47, 48].
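
The consistency of Eqs. (12) and (13) can be checked numerically. The sketch below, with assumed illustrative values for `F_INF`, `XI` and `C` (energies in units of $k_B T$), evaluates the size distribution, verifies that the protein contained in fibrils adds up to approximately C, and confirms the $\sqrt{C}$ scaling of L.

```python
import math


def mean_fibril_size(C, f_inf, xi):
    """Eq. (13): L = sqrt(C * exp(-xi * f_inf)), with f_inf in units of k_B T."""
    return math.sqrt(C * math.exp(-xi * f_inf))


def fibril_fraction(i, C, f_inf, xi):
    """Eq. (12): n_i = exp(-i / L + xi * f_inf)."""
    L = mean_fibril_size(C, f_inf, xi)
    return math.exp(-i / L + xi * f_inf)


# Assumed example values, chosen so that the system is deep in the fibrillar regime.
F_INF, XI, C = -20.0, 1.0, 1e-3

L = mean_fibril_size(C, F_INF, XI)
# Sum rule: the protein contained in fibrils, sum_i i * n_i, should be close to C.
total = sum(i * fibril_fraction(i, C, F_INF, XI) for i in range(1, int(50 * L)))
print(f"L = {L:.1f}, sum_i i n_i = {total:.6e}, C = {C:.6e}")

# Quadrupling the concentration doubles the mean fibril size.
print(mean_fibril_size(4 * C, F_INF, XI) / L)
```

In this form the profile $i \times n_i^{(c)}$ is proportional to $i e^{-i/L}$, which peaks at i = L.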

We will now estimate the magnitudes of the terms appearing in Eq. (3). For the first term,
let us assume that a protein in monomeric form is in the denatured state, and that a
fibrilized protein corresponds to the folded state. It has been estimated, experimentally and
theoretically, that in going from the denatured state to the folded state a protein loses on
average around $k_B \ln 10$ of entropy per amino acid [50]. We therefore estimate
$\Delta s_c$ as $-R k_B \ln 10 = -2.3 \times R k_B$, where R is the number of amino acids in
the protein. For the third term in Eq. (3), it has been demonstrated that the change in the
protein's volume upon folding is very small [51]. Indeed, the change in volume per amino acid
upon folding is found to be of the order of 0.01 nm$^3$ [51], which suggests that
$p\Delta v_c \sim 2 \times 10^{-4} \times R k_B T$ at atmospheric pressure and room
temperature. It is therefore negligible in comparison to the entropic contribution. The
second term in Eq. (3) involves a combination of interactions, such as hydrogen bonding,
hydrophobic interactions, electrostatic interactions, etc., among which hydrophobic
interactions, which are of the order of a few $k_B T$ per amino acid, are believed to be
dominant [51, 52]. Since hydrophobic interactions involve the effective burying of
hydrophobic side chains inside the protein structure, this indicates the need for a
multi-layered fibrillar structure (such as the two-tape model employed here), as universally
observed in amyloid fibrils formed from different proteins [1].
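
The relative sizes of the entropic and volumic terms can be evaluated directly from the inputs quoted above. The sketch below assumes a temperature of 298.15 K (the text does not state one) and uses the per-residue entropy loss $k_B \ln 10$ and volume change 0.01 nm$^3$; both results are expressed per amino acid in units of $k_B T$.

```python
import math

K_B = 1.380649e-23           # Boltzmann constant in J/K (exact SI value)
ATM = 101325.0               # one standard atmosphere in Pa
T = 298.15                   # assumed room temperature in K
DV_PER_RESIDUE = 0.01e-27    # 0.01 nm^3 expressed in m^3

# Entropic cost per residue, -T * (-k_B ln 10), in units of k_B T
entropic_term = math.log(10)

# Pressure-volume term per residue, p * dv, in units of k_B T
pv_term = ATM * DV_PER_RESIDUE / (K_B * T)

print(f"entropic term : {entropic_term:.3f} k_B T per residue")
print(f"p dv term     : {pv_term:.2e} k_B T per residue")
print(f"ratio         : {pv_term / entropic_term:.2e}")
```

Because the pressure-volume term is linear in p, it only becomes comparable to $k_B T$ per residue at pressures of several thousand atmospheres.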

III. EXPERIMENTAL IMPLICATIONS 

FIG. 2: (Color online) Dominance diagram of the three-species system at concentrations
higher than the critical concentrations, CMC and CFC. The coloured arrows depict how the
dominance may shift under an increase in hydrophobicity (red), an increase in the number of
aromatic side chains (blue), an increase in the length of an alternating
hydrophobic-hydrophilic amino-acid sequence (green), and an increase in unpaired charges in
the side chains (black). Inset plot: The volume fractions of i-mer fibrils versus i in the
fibrillar phase (i.e., C ≫ CFC and CMC > CFC), where $L = \langle i N_i^{(c)} \rangle$ is set
to 500.

We will now focus on the fibrillar phase, i.e., we are in the scenario where C > CFC and
CFC < CMC. According to Eq. (13), at fixed C:

$$\frac{\partial \ln L}{\partial T} = -\frac{\xi}{2 k_B} \frac{\partial}{\partial T} \left( \frac{f_\infty}{T} \right) = \frac{\xi}{2 k_B T^2} \left( f_\infty - T \frac{\partial f_\infty}{\partial T} \right) .$$

If we equate a monomeric protein to a denatured protein, and a fibrilized protein to a folded
protein, then experimental work indicates that $f_\infty/T$ is a concave-up function of T
with a minimum at around 20 °C [53, 54]. This suggests that, in an isobaric experiment, the
average fibril length first increases and then decreases as the temperature increases.
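
The non-monotonic temperature dependence can be illustrated with a toy calculation. The sketch below is not a fit to any data: it assumes, purely for illustration, that $f_\infty/k_B T$ is a parabola in T with its minimum at 20 °C, with an invented depth and curvature, and evaluates $L = \sqrt{C} \exp[-\xi f_\infty / 2 k_B T]$ as implied by Eq. (13).

```python
import math

T_MIN_K = 293.15  # 20 °C, where f_inf / T is taken to be minimal


def reduced_bfe(T, g_min=-20.0, curvature=2e-3):
    """Hypothetical model of f_inf / (k_B T): a parabola with its minimum at T_MIN_K."""
    return g_min + curvature * (T - T_MIN_K) ** 2


def mean_fibril_size(T, C=1e-3, xi=1.0):
    """L = sqrt(C) * exp(-xi * f_inf / (2 k_B T)), from Eq. (13)."""
    return math.sqrt(C) * math.exp(-0.5 * xi * reduced_bfe(T))


for celsius in (0, 10, 20, 30, 40):
    T = celsius + 273.15
    print(f"{celsius:3d} °C: L = {mean_fibril_size(T):8.1f}")
```

With any concave-up choice for `reduced_bfe`, the mean fibril size is largest at the temperature where $f_\infty/T$ is smallest, and falls off on either side.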

The situation for pressure variation is more complicated because the compressibility differs
for different amino acids. Nevertheless, it has generally been found that at low pressure
(∼ 1 atm) the change in volume upon folding is small, while the change is positive at very
high pressure (7500 atm [55]) because the denatured protein has greater compressibility. In
other words, if we again equate a monomeric protein to a denatured protein, and a fibrilized
protein to a folded protein, we would expect that, in the very-high-pressure regime, an
increase in pressure leads to an exponential decrease in the average fibrillar length in an
isothermal experiment.

IV. RELEVANCE TO PREVIOUS EMPIRICAL FINDINGS

As discussed in Sect. II, if C > CMC, CFC, the dominant species in the system will be the
one with the lower critical concentration. Namely, in terms of the BFE, the dominant species
will be fibrillar if $f_\infty < \gamma$, and micellar otherwise. We will now discuss how the
primary sequence may affect amyloid propensity in terms of the BFE. To simplify the
discussion, we assume that substituting an amino acid predominantly affects the binding
energy term, $\Delta\epsilon$, in the BFE (cf. Fig. 2).
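
The dominance rule can be written as a small decision function. The sketch below is an illustration with assumed parameter values (energies in units of $k_B T$); it also labels the system as monomer-dominated when C lies below both critical concentrations, which follows from the definitions of the CMC and the CFC.

```python
import math


def dominant_species(C, gamma, f_inf):
    """Return the dominant species for total volume fraction C.

    gamma and f_inf are the micellar and fibrillar BFE per protein in units
    of k_B T, so CMC = exp(gamma) and CFC = exp(f_inf).
    """
    cmc = math.exp(gamma)
    cfc = math.exp(f_inf)
    if C <= min(cmc, cfc):
        return "monomer"
    # Above the lower critical concentration, the aggregate with the lower
    # critical concentration (the lower BFE) takes up the excess protein.
    return "fibril" if f_inf < gamma else "micelle"


# Assumed example: a substitution that lowers f_inf below gamma flips the outcome.
print(dominant_species(1e-3, gamma=-10.0, f_inf=-8.0))
print(dominant_species(1e-3, gamma=-10.0, f_inf=-12.0))
```

In this picture, a sequence change promotes amyloid formation when it lowers $f_\infty$ relative to $\gamma$, which is how the arrows in the dominance diagram should be read.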

As a result of empirical observations [22, 23], it is generally agreed that the following
factors promote amyloid formation: i) an increase in hydrophobicity, and ii) an increase in
the length of an alternating hydrophobic-hydrophilic amino-acid sequence; whereas an increase
in the number of charged amino acids is found to decrease amyloid propensity. In terms of the
binding energies, an increase in hydrophobicity would decrease both $\Delta\epsilon_b$ and
$\Delta\epsilon_c$ and as such would on average increase the fibrilization probability if the
protein's parameters are already close to the monomer-fibril boundary in the dominance
diagram (the red arrow in Fig. 2). For our two-tape model of the amyloid fibril (cf.
Fig. 1(c)), a longer alternating hydrophobic-hydrophilic amino-acid sequence would allow the
hydrophobic side chains to be packed inside the cross-beta sheet structure while the
hydrophilic side chains remain outside; this would decrease $\Delta\epsilon_c$. On the other
hand, such a pattern would conceivably decrease the average energy gained inside a micellar
structure, given its amorphous nature, i.e., $\Delta\epsilon_b$ will be increased. Such a
modification would therefore increase amyloid propensity (the green arrow in Fig. 2). If
there is an increase in unpaired charges in the protein, i.e., charges that are not
accompanied by ionic bonds, electrostatic interactions would deter aggregation and as such
both $\Delta\epsilon_b$ and $\Delta\epsilon_c$ will be increased (the black arrow in Fig. 2).

Another insight we can gain from the above consideration concerns the importance of aromatic
residues in amyloid propensity. Besides the heightened hydrophobicity of aromatic residues,
the offset π-stacking interaction is directional along the fibrillar axis [59], hence
$\Delta\epsilon_c$ may be decreased more than $\Delta\epsilon_b$ (the blue arrow in Fig. 2).
This suggests that aromatic interactions, or any interactions directional along the fibrillar
axis, contribute to amyloid stability in a way different from ordinary hydrophobic
interactions.



V. KINETICS 

FIG. 3: (Color online) (a) A schematic diagram depicting the amyloid-beta self-assembly
process proposed in [53]. The circle denotes the free monomeric state, the square denotes the
typical micellar state (the micellar size, M, is estimated to be 25 [53]), and the triangle
denotes a stable nucleus (the nucleus size is estimated to be 10 [13]). The thick arrow
depicts the fast pathway from free monomers to micelles and the thin arrow depicts the slow
process of nucleation from micelles. The broken arrow depicts the very slow process of
nucleation from free monomers, which is beyond the range of experimental time scales but may
play an important role in actual pathogenesis on physiological time scales. (b) The temporal
evolution of the monomer concentration. Upper plot: When C > CMC, the monomers are quickly
converted into micelles and then slowly into fibrils. The figures above the curve depict the
dominant species in the solution as time progresses. Lower plot: When CFC < C < CMC, the
proteins remain in monomeric form for a time longer than can be probed experimentally. These
two plots show the curious possibility of ending up with a lower monomeric concentration when
the initial concentration is higher.

We now discuss how the picture developed in this paper helps to describe the kinetics of the
protein amyloid self-assembly process investigated experimentally. According to the model
proposed in [57], the series of events leading up to the fibrilization of amyloid-β proteins
is depicted in Fig. 3. In this scenario, the direct pathway from monomers to stable nuclei
(depicted by the broken arrow in Fig. 3) operates on a time scale too long to be probed
experimentally. Therefore, the only experimentally accessible fibrilization pathway is for
the monomers first to form micelles (a fast process, depicted by the thick black arrow), out
of which stable nuclei are then formed (a slow process, depicted by the thin black arrow).
Based on this model, within the temporal constraints of experiments, fibrilization is only
possible if C > CMC (cf. Fig. 3(b)). In the case of the amyloid-β protein, the CMC has been
measured to be of the order of 10 μM [60]. This is substantially higher than the
concentration of amyloid-beta in the cerebrospinal fluid, which is in the subnanomolar range
[61]. This raises the question of whether current experimental methods probe only the fast
pathway (monomers to micelles to nuclei, depicted by the two solid arrows), while the
physiologically relevant pathway is the slow one (monomers directly to nuclei, depicted by
the broken arrow).
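
The distinction between the two pathways can be summarised as a comparison of the protein concentration with the two critical concentrations. The sketch below is only an illustration of that logic under the assumption CFC < CMC: the CMC is set to the order of magnitude quoted above, while the CFC and the physiological concentration are hypothetical placeholder values chosen for the example.

```python
def kinetic_regime(C, cmc, cfc):
    """Classify the fibrilization pathway available at concentration C.

    Assumes cfc < cmc; all three arguments must share the same units.
    """
    if C > cmc:
        return "micelle-mediated"   # fast micelle formation, then slow nucleation
    if C > cfc:
        return "direct-nucleation"  # fibrils are stable but form very slowly
    return "none"                   # fibrils are not favoured at equilibrium


CMC_MOLAR = 10e-6    # order of magnitude quoted in the text
CFC_MOLAR = 0.1e-9   # hypothetical value, for illustration only
CSF_MOLAR = 0.5e-9   # hypothetical subnanomolar concentration

print(kinetic_regime(50e-6, CMC_MOLAR, CFC_MOLAR))      # typical in-vitro excess
print(kinetic_regime(CSF_MOLAR, CMC_MOLAR, CFC_MOLAR))  # physiological range
```

Whether the physiological concentration actually lies above the CFC is not established here; the placeholder values only show how the two regimes would be told apart.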






VI. CONCLUSION 

In this work, we have considered the thermodynamic equilibrium behavior of a system
consisting of a mixture of monomeric proteins and the corresponding micellar and fibrillar
aggregates. We have deduced the concentrations of these species at thermal equilibrium and
found that the average fibrillar length is very sensitive to temperature and pressure
variations. We have also discussed the relevance of our investigation to previous empirical
findings and to the understanding of the kinetic process of fibrilization.